Building on Redundancy: Factoid Question Answering, Robust Retrieval and the "Other"
Abstract
We have explored how redundancy-based techniques can improve factoid question answering, definitional ("other") questions, and robust retrieval. For the factoids, we explored the meta approach: we submitted the questions to several open-domain question answering systems available on the Web and applied our redundancy-based triangulation algorithm to their outputs in order to identify the most promising answers. Our results support the added value of the meta approach: the performance of the combined system surpassed the performances of its underlying components. To answer definitional ("other") questions, we looked for sentences containing re-occurring pairs of noun entities that include the elements of the target. For robust retrieval, we applied our redundancy-based Internet mining technique to identify the concepts (single-word terms or phrases) that were highly related to the topic (query) and expanded the queries with them. All our results are above the mean performance in the categories in which we participated, with one of our robust runs being the best in its category among all 24 participants. Overall, our findings support the hypothesis that using as much textual data as possible, specifically data mined from the World Wide Web, is extremely promising.

FACTOID QUESTION ANSWERING

The Natural Language Processing (NLP) task behind Question Answering (QA) technology is known to be Artificial Intelligence (AI) complete: it requires computers to be as intelligent as people, to understand the deep semantics of human communication, and to be capable of common sense reasoning. As a result, different systems have different capabilities. They vary in the range of tasks that they support, the types of questions they can handle, and the ways in which they present the answers. Following the example of meta search engines on the Web (Selberg & Etzioni, 1995), we advocate combining several fact seeking engines into a single "meta" approach. Meta search engines (sometimes called metacrawlers) take a query consisting of keywords (e.g. "rotary engines"), send it to several portals (e.g. Google, MSN, etc.), and then combine the results. This allows them to provide better coverage and specialization. Examples are MetaCrawler (Selberg & Etzioni, 1995), 37.com (www.37.com), and Dogpile (www.dogpile.com). Although keyword-based meta search engines have been suggested and explored in the past, we are not aware of a similar approach tried for the task of open domain/corpus question answering (fact seeking).

The practical benefits of the meta approach are justified by a general consideration: eliminating the "weakest link" dependency. It does not rely on a single system, which may fail or may simply not be designed for a specific type of task (question). The meta approach promises higher coverage and recall of the correct answers, since different QA engines may cover different databases or different parts of the Web. In addition, the meta approach can reduce subjectivity by querying several engines; as in the real world, one can gather the views of several people in order to make the answers more accurate and objective. The speed provided by several systems queried in parallel can also significantly exceed that obtained by working with only one system, since their responsiveness may vary with the task and network traffic conditions.

In addition, the meta approach fits nicely into the increasingly popular Web services model, where each service (QA engine) is independently developed and maintained, and the meta engine integrates them together while still being organizationally independent from them. Since each engine may be provided by a commercial company interested in increasing its advertising revenue or by a research group showcasing its cutting edge technology, the competition mechanism will also ensure quality and diversity among the services. Finally, a meta engine can be customized for a particular portal, such as those supporting business intelligence, education, or serving visually impaired or mobile phone users.
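As a rough, self-contained sketch of the parallel querying described above, the following code fans a question out to several answer services concurrently and collects whatever each returns. The service names, URL templates, and response handling are hypothetical placeholders, not our prototype's actual code:

```python
# Minimal sketch of querying several answer services in parallel.
# The service endpoints below are illustrative placeholders.
from concurrent.futures import ThreadPoolExecutor
import urllib.parse
import urllib.request

SERVICES = {
    # name -> URL template (hypothetical query formats)
    "ServiceA": "http://example-qa-a.org/ask?q={q}",
    "ServiceB": "http://example-qa-b.org/answer?query={q}",
}

def query_service(name, template, question, timeout=10.0):
    """Fetch raw output from one answer service; failures are tolerated."""
    url = template.format(q=urllib.parse.quote(question))
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return name, resp.read().decode("utf-8", errors="replace")
    except OSError:
        return name, None  # one failed service must not sink the meta engine

def fan_out(question):
    """Query all services concurrently and keep the non-empty results."""
    with ThreadPoolExecutor(max_workers=len(SERVICES)) as pool:
        futures = [pool.submit(query_service, n, t, question)
                   for n, t in SERVICES.items()]
        return {name: text for name, text in (f.result() for f in futures)
                if text is not None}
```

Tolerating per-service failures in this way is what removes the "weakest link" dependency: a slow or unavailable engine simply contributes nothing to the answer pool.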
Figure 1. Example of START output.

Figure 2. Example of BrainBoost output.

Meta Approach Defined

We define a fact seeking meta engine as a system that can combine, analyze, and present the answers obtained from several underlying systems (called answer services throughout this paper). At least some of these underlying services (systems) have to be capable of providing candidate answers to some types of questions asked in natural language form; otherwise the overall architecture would be no different from a single fact seeking engine, which is typically based on a commercial keyword search engine, e.g. Google. The technology behind each of the answer services can be as complex as deep semantic NLP or as simple as shallow pattern matching.

Fact Seeking Service | Web address                | Output Format              | Organization/System | Performance in our evaluation (MRR)
START                | start.csail.mit.edu        | Single answer sentence     | Research Prototype  | 0.049**
AskJeeves            | www.ask.com                | Up to 200 ordered snippets | Commercial          | 0.397**
BrainBoost           | www.brainboost.com         | Up to 4 snippets           | Commercial          | 0.409*
ASU QA on the Web    | qa.wpcarey.asu.edu         | Up to 20 ordered sentences | Research Prototype  | 0.337**
Wikipedia            | en.wikipedia.org           | Narrative                  | Non-profit          | 0.194**
ASU Meta QA          | http://qa.wpcarey.asu.edu/ | Precise answer             | Research Prototype  | 0.435

Table 1. The fact seeking services involved, their characteristics, and their performances in the evaluation on the 2004 questions. * and ** indicate 0.1 and 0.05 levels of statistical significance of the difference from the best, respectively.

Challenges Faced and Addressed

Combining multiple fact seeking engines also faces several challenges. First, their output formats may differ: some engines produce an exact answer (e.g. START), while others present one sentence or an entire snippet (several sentences) similar to web search engines, as shown in Figures 1-4. Table 1 summarizes those differences and other capabilities of the popular fact seeking engines. Second, the accuracy of responses may differ overall and shows even higher variability depending on the specific type of question. Finally, we have to deal with multiple answers, so removing duplicates and resolving answer variations is necessary. The issues of merging search results from multiple engines have already been explored by MetaCrawler (Selberg & Etzioni, 1995) and by fusion studies in information retrieval (e.g. Vogt & Cottrell, 1999), but only in the context of merging lists of retrieved text documents. We argue that the task of fusing multiple short answers, which may potentially conflict with or confirm each other, is fundamentally different and poses a new challenge for researchers. For example, some answer services (components) may be very precise (e.g. START) but cover only a small proportion of questions. They need to be backed up by less precise services that have higher coverage (e.g. AskJeeves). However, backing up may easily dilute the answer set with spurious (wrong) answers. Thus, there is a need for some kind of triangulation of the candidate answers provided by the different services, or of multiple candidate answers provided by the same service.

Figure 3. Example of Ask Jeeves output.

Figure 4. Example of ASU QA output.

Triangulation, a term widely used in intelligence and journalism, stands for confirming or disconfirming facts by using multiple sources. Roussinov et al. (2004) went one step further than the frequency counts explored earlier by Dumais et al. (2002) and by groups involved in TREC competitions. They explored a more fine-grained triangulation process, which we also used in our prototype. Their algorithm can be demonstrated by the following intuitive example. Imagine that we have two candidate answers for the question "What was the purpose of the Manhattan Project?": 1) "To develop a nuclear bomb"; 2) "To create an atomic weapon". These two answers support (triangulate) each other since they are semantically similar. However, a straightforward frequency count approach would not pick up this similarity. The advantage of triangulation over simple frequency counting is that it is more powerful for less "factual" questions, such as those that may allow variations in the correct answers. In order to enjoy the full power of triangulation with factoid questions (e.g. "Who is the CEO of IBM?"), the candidate answers have to be extracted from their sentences (e.g. Samuel Palmisano) so that they can be more accurately compared with the other candidate answers (e.g. Sam Palmisano). That is why the meta engine needs to possess answer understanding capabilities as well, including such crucial capabilities as question interpretation and semantic verification that the candidate answers belong to the desired category (a person in the example above).
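The scoring loop behind this idea can be sketched as follows. A crude token-overlap similarity stands in for the semantic similarity measure of Roussinov et al. (2004) that the actual system uses; the function names and the similarity choice are ours, for illustration only:

```python
# Minimal sketch of answer triangulation: each candidate answer is
# scored by how strongly the other candidates confirm it.
def similarity(a, b):
    """Crude token-overlap (Jaccard) similarity between two candidates."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def triangulate(candidates):
    """Rank candidates by the total support they receive from the others."""
    scored = []
    for i, cand in enumerate(candidates):
        support = sum(similarity(cand, other)
                      for j, other in enumerate(candidates) if j != i)
        scored.append((cand, support))
    return sorted(scored, key=lambda x: x[1], reverse=True)

# "To develop a nuclear bomb" and "to create an atomic weapon" only
# triangulate under a semantic similarity (bomb ~ weapon, nuclear ~ atomic);
# the token-overlap stand-in illustrates the scoring loop, not that step.
print(triangulate(["Samuel Palmisano", "Sam Palmisano", "Armonk, New York"]))
```

Note how the two name variants reinforce each other while the unrelated candidate receives little support, which is exactly the effect a plain frequency count would miss.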
Figure 5. The Meta approach to fact seeking.

Fact Seeking Engine Meta Prototype: Underlying Technologies and Architecture

In the first version of our prototype, we included several freely available demonstrational prototypes and popular commercial engines on the Web that have some QA (fact seeking) capabilities, specifically START, AskJeeves, BrainBoost, and ASU QA (Table 1, Figures 1-4). We also added Wikipedia to the list. Although it does not have QA capabilities, it provides good quality factual information on a variety of topics, which adds power to our triangulation mechanism. Google was not used directly as a service, but BrainBoost and ASU QA already use it among the other major keyword search engines. The meta-search part of our system was based on the MetaSpider architecture (Chau et al., 2001; Chen et al., 2001). Multiple threads are launched to submit the query and fetch the candidate answers from each service. After these results are obtained, the system performs answer extraction, triangulation, and semantic verification of the results, based on the algorithms from Roussinov et al. (2004). Figure 5 summarizes the overall process. For the TREC competition, we applied the same answer projection algorithm as last year, which tried to find the best supporting document within the TREC collection (AQUAINT) by matching the words from the question and the target.
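A minimal sketch of such word-overlap projection might look as follows; the tokenization, the in-memory collection, and the raw-overlap score are illustrative simplifications of the actual algorithm:

```python
# Minimal sketch of answer projection: pick the collection document that
# shares the most words with the question, target, and candidate answer.
import re

def tokens(text):
    """Lowercased word tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def project(question, target, answer, collection):
    """Return the id of the best supporting document, if any overlaps.

    collection: dict mapping document id -> document text.
    """
    query_words = tokens(question) | tokens(target) | tokens(answer)
    best_id, best_score = None, 0
    for doc_id, text in collection.items():
        score = len(query_words & tokens(text))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id
```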
We have been maintaining a working prototype on the Web (http://qa.wpcarey.asu.edu/) since August 2004 and have already accumulated 1000+ questions that we can use to test our future research hypotheses and fine-tune our algorithms. The prototype has been featured in Information Week (Claburn, 2005) as one of the promising directions in the "Web Search of Tomorrow."

Testing on 2004 Questions

Before the answer submission deadline this year, we fine-tuned the weights given to the underlying answer services and evaluated our meta approach. We used the set of 200 test questions and regular expression answer keys from the Question Answering Track of the TREC 2004 conference (Voorhees and Buckland, 2004). Although various metrics have been explored in the past, we used the mean reciprocal rank (MRR) of the first correct answer, as in TREC 2001, TREC 2002, and Dumais et al. (2002). This metric assigns a score of 1 to the question if the first answer is correct. If only the second answer is correct, the score is 1/2; a correct third answer results in 1/3, etc. The drawback of this metric is that it is not the most sensitive, since it only considers the first correct answer, ignoring what follows. However, it is still more sensitive than the accuracy metric used in the TREC 2004 evaluation, which considers only a single answer per question.
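For concreteness, the MRR computation described above can be sketched as follows; a question for which no returned answer is correct scores 0:

```python
# Minimal sketch of mean reciprocal rank (MRR) over a question set.
# first_correct_ranks holds the 1-based rank of the first correct answer
# for each question, or None when no returned answer was correct.
def mean_reciprocal_rank(first_correct_ranks):
    scores = [1.0 / r if r is not None else 0.0 for r in first_correct_ranks]
    return sum(scores) / len(scores) if scores else 0.0

# Example: correct at rank 1, rank 2, never, rank 3 -> (1 + 1/2 + 0 + 1/3) / 4
print(mean_reciprocal_rank([1, 2, None, 3]))  # 0.4583...
```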